(57) Abstract

The results of experiments aimed at detecting polymorphisms and mutations in the BRCA1 promoter region, as well as comparisons
of two published DNA sequences, indicated that two similar but distinct copies of this region exist in the human genome. PCR primers
specific for amplification of each of the two promoter regions were designed, and clones were isolated from rearrangement-resistant
libraries. Sequence analysis of the clones and specific PCR products reveals two similar genomic arrangements of head-to-head genes.
The BRCA1 gene is closely apposed to a gene structure that is similar but not identical to 1A1.3B, and the 1A1.3B gene is apposed to
a gene structure that has strong similarity to BRCA1 but also has significant differences. The features of the BRCA1 and 1A1.3B
promoter regions are shown in the Figure. STS analysis of YAC and P1 clones located in the vicinity of BRCA1 indicates that these
similar promoter regions are elements of a direct duplication. New hypotheses for genetic mechanisms that may be involved in breast
and ovarian cancer etiology are raised by the identification of this duplicated genetic structure on chromosome 17q. Also presented
are polymorphisms in the duplicated genes, which are useful in tracking chromosomal rearrangement of these genes.

TITLE OF THE INVENTION

THE BRCA1 AND 1A1.3B PROMOTERS ARE PARALLEL ELEMENTS OF A 
GENOMIC DUPLICATION AT 17Q21 

This application was made with Government support under Grant Nos. NCI
CA63689 and UCI sub-contract J92CA56554 (from NCI CA58660) and NIH CA42014,
funded by the National Institutes of Health, Bethesda, Maryland. The United States
Government has certain rights in the invention.

FIELD OF THE INVENTION



The present invention relates generally to the field of human genetics.
Specifically, the present invention relates to a gene, named LBRCA1 (which stands for
"Like BRCA1"), which is very similar to a human breast and ovarian cancer predisposing
gene (BRCA1), some mutant alleles of which cause susceptibility to cancer. The
invention also relates to a gene called 1A1.3B and a very similar gene named L1A1.3B
(for "Like 1A1.3B"). L1A1.3B is located extremely close to BRCA1 in a head-to-head
configuration, while LBRCA1 and 1A1.3B are similarly located very close to each other,
also in a head-to-head arrangement; genes that have 5' ends located immediately
adjacent to one another are said to be "head-to-head". The BRCA1/L1A1.3B and
LBRCA1/1A1.3B regions are a result of gene duplication. Knowledge of the LBRCA1
sequence is important for the analysis of BRCA1 for mutations because the very high
similarity between the two genes could lead to problems when trying to analyze BRCA1.
Extensive testing of persons for mutations in BRCA1 is expected to begin very soon.
LBRCA1 and L1A1.3B contain promoter regions similar to the promoters for BRCA1
and 1A1.3B. These additional promoters, which are in close proximity to the BRCA1 and
1A1.3B genes, may affect transcription of these latter genes.

A further aspect of the present invention is that knowledge of the
chromosomal arrangement of these genes, and of the fact that there has been a gene
duplication, is useful in looking for mutations, other than mutations directly within
BRCA1, which could affect proper transcription of BRCA1 and may be responsible for
breast or ovarian cancer. Another aspect of the invention is that polymorphisms in or near
LBRCA1 and L1A1.3B have been found, and these are useful in tracking the
chromosomal arrangement of these genes as well as BRCA1 and 1A1.3B to determine
whether rearrangement has occurred.

The publications and other materials used herein to illuminate the background of
the invention, and in particular, cases to provide additional details respecting the practice,
are incorporated herein by reference, and for convenience, are referenced by author and
date in the following text and respectively grouped in the appended List of References.



BACKGROUND OF THE INVENTION

The genetics of cancer is complicated, involving multiple dominant, positive
regulators of the transformed state (oncogenes) as well as multiple recessive, negative
regulators (tumor suppressor genes). Over one hundred oncogenes have been characterized.
Fewer than a dozen tumor suppressor genes have been identified, but the number is
expected to increase beyond fifty (Knudson, 1993).

The involvement of so many genes underscores the complexity of the growth control
mechanisms that operate in cells to maintain the integrity of normal tissue. This complexity
is manifest in another way. So far, no single gene has been shown to participate in the
development of all, or even the majority of, human cancers. The most common oncogenic
mutations are in the H-ras gene, and are found in 10-15% of all solid tumors (Anderson et
al., 1992). The most frequently mutated tumor suppressor genes are the TP53 gene,
homozygously deleted in roughly 50% of all tumors, and CDKN2, which was
homozygously deleted in 46% of tumor cell lines examined (Kamb et al., 1994). Without a
target that is common to all transformed cells, the dream of a "magic bullet" that can destroy
or revert cancer cells while leaving normal tissue unharmed is improbable. The hope for a
new generation of specifically targeted antitumor drugs may rest on the ability to identify
tumor suppressor genes or oncogenes that play general roles in control of cell division.

The tumor suppressor genes which have been cloned and characterized influence
susceptibility to: 1) Retinoblastoma (RB1); 2) Wilms' tumor (WT1); 3) Li-Fraumeni
(TP53); 4) Familial adenomatous polyposis (APC); 5) Neurofibromatosis type 1 (NF1); 6)
Neurofibromatosis type 2 (NF2); 7) von Hippel-Lindau syndrome (VHL); 8) Multiple
endocrine neoplasia type 2A (MEN2A); and 9) Melanoma (CDKN2).

Tumor suppressor loci that have been mapped genetically but not yet isolated include
genes for: Multiple endocrine neoplasia type 1 (MEN1); Lynch cancer family syndrome 2
(LCFS2); Neuroblastoma (NB); Basal cell nevus syndrome (BCNS); Beckwith-Wiedemann
syndrome (BWS); Renal cell carcinoma (RCC); Tuberous sclerosis 1 (TSC1); and Tuberous
sclerosis 2 (TSC2). The tumor suppressor genes that have been characterized to date encode
products with similarities to a variety of protein types, including DNA binding proteins
(WT1), ancillary transcription regulators (RB1), GTPase activating proteins or GAPs (NF1),
cytoskeletal components (NF2), membrane bound receptor kinases (MEN2A), cell cycle
regulators (CDKN2) and others with no obvious similarity to known proteins (APC and
VHL).

In many cases, the tumor suppressor gene originally identified through genetic studies
has been shown to be lost or mutated in some sporadic tumors. This result suggests that
regions of chromosomal aberration may signify the position of important tumor suppressor
genes involved both in genetic predisposition to cancer and in sporadic cancer.

One of the hallmarks of several tumor suppressor genes characterized to date is that
they are deleted at high frequency in certain tumor types. The deletions often involve loss
of a single allele, a so-called loss of heterozygosity (LOH), but may also involve
homozygous deletion of both alleles. For LOH, the remaining allele is presumed to be
nonfunctional, either because of a preexisting inherited mutation, or because of a secondary
sporadic mutation.

Two genes that are predisposing for breast cancer have recently been cloned and
characterized. These are BRCA1 (Miki et al., 1994; Futreal et al., 1994) and BRCA2
(Wooster et al., 1995; Tavtigian et al., 1996). Breast cancer is one of the most significant
diseases that affects women. At the current rate, American women have a 1 in 8 risk of
developing breast cancer by age 95 (American Cancer Society, 1992). Treatment of breast
cancer at later stages is often futile and disfiguring, making early detection a high priority in
medical management of the disease. Ovarian cancer, although less frequent than breast
cancer, is often rapidly fatal and is the fourth most common cause of cancer mortality in
American women. Genetic factors contribute to an ill-defined proportion of breast cancer
incidence, estimated to be about 5% of all cases but approximately 25% of cases diagnosed
before age 40 (Claus et al., 1991). Breast cancer has been subdivided into two types,
early-age onset and late-age onset, based on an inflection in the age-specific incidence curve
around age 50. Mutation of one gene, BRCA1, is thought to account for approximately
45% of familial breast cancer, but at least 80% of families with both breast and ovarian
cancer (Easton et al., 1993).

There were intense efforts to isolate the BRCA1 gene after it was first mapped in
1990 (Hall et al., 1990; Narod et al., 1991). A second locus, BRCA2, was mapped to
chromosome 13q (Wooster et al., 1994) and appears to account for a proportion of
early-onset breast cancer roughly equal to BRCA1, but confers a lower risk of ovarian cancer.
The remaining susceptibility to early-onset breast cancer is divided between as yet
unmapped genes for familial cancer, and rarer germline mutations in genes such as TP53
(Malkin et al., 1990). It has also been suggested that heterozygote carriers for defective
forms of the Ataxia-Telangiectasia gene are at higher risk for breast cancer (Swift et al.,
1976; Swift et al., 1991). Late-age onset breast cancer is also often familial, although the
risks in relatives are not as high as those for early-onset breast cancer (Cannon-Albright et
al., 1994; Mettlin et al., 1990). However, the percentage of such cases due to genetic
susceptibility is unknown.

Breast cancer has long been recognized to be, in part, a familial disease (Anderson,
1972). Numerous investigators have examined the evidence for genetic inheritance and
concluded that the data are most consistent with dominant inheritance for a major
susceptibility locus or loci (Bishop and Gardner, 1980; Go et al., 1983; Williams and
Anderson, 1984; Bishop et al., 1988; Newman et al., 1988; Claus et al., 1991). Early results
demonstrated that at least three loci exist which convey susceptibility to breast cancer as
well as other cancers. These loci are the TP53 locus on chromosome 17p (Malkin et al.,
1990), a 17q-linked susceptibility locus known as BRCA1 (Hall et al., 1990), and one or
more loci responsible for the unmapped residual. As noted above, the BRCA1 and BRCA2
genes have recently been identified. These are located on chromosomes 17q and 13q,
respectively. Hall et al. (1990) indicated that the inherited breast cancer susceptibility in
kindreds with early age onset is linked to chromosome 17q21, although subsequent studies
by this group using a more appropriate genetic model partially refuted the limitation to early
onset breast cancer (Margaritte et al., 1992).

The simplest model for the functional role of BRCA1 holds that alleles of BRCA1
that predispose to cancer are recessive to wild type alleles; that is, cells that contain at least
one wild type BRCA1 allele are not cancerous. However, cells that contain one wild type
BRCA1 allele and one predisposing allele may occasionally suffer loss of the wild type
allele either by random mutation or by chromosome loss during cell division
(nondisjunction). All the progeny of such a mutant cell lack the wild type function of
BRCA1 and may develop into tumors. According to this model, predisposing alleles of
BRCA1 are recessive, yet susceptibility to cancer is inherited in a dominant fashion:
women who possess one predisposing allele (and one wild type allele) risk developing
cancer, because their mammary epithelial cells may spontaneously lose the wild type
BRCA1 allele. This model applies to a group of cancer susceptibility loci known as tumor
suppressors or antioncogenes, a class of genes that includes the retinoblastoma gene and
neurofibromatosis gene. By inference this model may also explain the BRCA1 function, as
has recently been suggested (Smith et al., 1992).

A second possibility is that BRCA1 predisposing alleles are truly dominant; that is, a
wild type allele of BRCA1 cannot overcome the tumor forming role of the predisposing
allele. Thus, a cell that carries both wild type and mutant alleles would not necessarily lose
the wild type copy of BRCA1 before giving rise to malignant cells. Instead, mammary cells
in predisposed individuals would undergo some other stochastic change(s) leading to
cancer.

If BRCA1 predisposing alleles are recessive, the BRCA1 gene is expected to be
expressed in normal mammary tissue but not functionally expressed in mammary tumors.
In contrast, if BRCA1 predisposing alleles are dominant, the wild type BRCA1 gene may or
may not be expressed in normal mammary tissue. However, the predisposing allele will
likely be expressed in breast tumor cells.

Identification of a breast cancer susceptibility locus permits the early detection of
susceptible individuals and greatly increases our ability to understand the initial steps that
lead to cancer. As susceptibility loci are often altered during tumor progression, cloning
these genes could also be important in the development of better diagnostic and prognostic
products, as well as better cancer therapies. Knowledge of specific mutations in the BRCA1
and BRCA2 genes, which are predisposing toward breast and/or ovarian cancer, has already
led to screening of patients for these mutations. Knowledge that there is a duplication of a
part of the BRCA1 gene in the chromosome, this duplication being named LBRCA1, is
important for accurate testing. The high similarity between BRCA1 and LBRCA1 could
lead to erroneous results or could confound the testing procedure. Knowledge of the
sequence and location of LBRCA1 will enable one to avoid these problems in testing.

BRCA1 is located very near to L1A1.3B, which is a partial duplication of the 1A1.3B
gene. The 1A1.3B gene is in turn located very near to LBRCA1. L1A1.3B lies head-to-head
within 250 base pairs of BRCA1. The overlapping of regulatory regions for the two genes may be of
importance in coordinate control of the two genes. The presence of a duplication
containing all or part of BRCA1 and 1A1.3B suggests that recombination events or other
homology-mediated genetic rearrangements, occurring somatically or as heritable
changes, could result in altered expression or inactivation of genes located within or close
to the duplicated segment, including, but not limited to, the BRCA1 and 1A1.3B genes.

Finally, polymorphisms have been found in LBRCA1 and the BRCA1 promoter
region. These will be useful in characterizing possible mutations in LBRCA1 and will
also be useful for "diagnosing" chromosomal rearrangements involving LBRCA1. This
is important because with other genes it has been shown that duplication of a segment of
human DNA results in a predisposition to genomic rearrangements that are associated
with disease. Such a mechanism may also occur with BRCA1 and such rearrangements
may be responsible for causing cancer rather than, e.g., a missense or nonsense mutation
within the gene. This mechanism may be important either to cause heritable defects or to
create gene defects during the somatic growth of cells that carry no inherited defect.


SUMMARY OF THE INVENTION

The present invention relates generally to the field of human genetics.
Specifically, the present invention relates to the duplication of a portion of human
chromosome 17q containing the breast cancer gene BRCA1. The invention relates to the
chromosomal arrangement and sequence similarities of BRCA1-L1A1.3B and
LBRCA1-1A1.3B. This invention further relates to LBRCA1 polymorphisms and BRCA1
promoter region polymorphisms that are useful in analyzing whether genomic
rearrangements have occurred and the usefulness of this in the diagnosis and prognosis of
human breast and ovarian cancer.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1A is a summary of chromosomal localization of specific PCR products
and restriction mapping of genomic clones for the BRCA1 promoter and its cognate. The
regions of chromosome 17 contained in the rodent-human hybrids ND-1 and MH-41 are
indicated by solid horizontal lines (vanTuinen et al., 1987). The inferred relative
locations and sizes of the genomic EcoRI fragments corresponding to those in CH40
clones 10A and 16C are shown. A conserved central EcoRI site is located between the
most 5' exons of the head-to-head gene arrangements and is marked by a double-thick
hash. The 16C clone was first identified as containing BRCA1-specific sequences
because it showed greater hybridization to the oligonucleotide 1007-5 than the 10A clone,
indicated here by heavy and light arrows, as described.

Figure 1B shows an STS analysis of YAC and P1 clones previously mapped to the
BRCA1 region (Albertsen et al., 1994; Neuhausen et al., 1994) and the 10A and 16C
clones described elsewhere in the specification. PCR primer combinations are as
described in Table 1 or elsewhere in the disclosure.

Figure 2 is a summary of features of the BRCA1 and 1A1.3B promoter region
sequences. Known exons of these two genes are indicated as solid boxes and the
corresponding regions of the putative cognate genes with the same apparent intron-exon
boundaries are indicated as checkered boxes. The largest segments unique to each
sequence are indicated as open triangles, with the length indicated in bp. The EcoRI site
marked is the central EcoRI site noted in Figure 1. The position of the BRCA1 major
translation product start site is indicated in exon 2 and a similar indication is shown for a
possible translation start site in the corresponding segment of LBRCA1. The approximate
locations of oligonucleotide primer sequences described in Table 1 are shown. The open
boxes at positions 150 and 525 on the basepair scale represent polymorphisms detected in
the BRCA1 promoter. The narrow box indicates a base difference, C or T, and the wide
box indicates a trinucleotide difference, AAC or AACAAC. In U37574 (shown in the
Sequence Listing as SEQ ID NO:1), these correspond to nucleotide position 612, which
is C in U37574, and positions 980-982, where U37574 contains a single trinucleotide element.
These polymorphisms are in apparent strong linkage disequilibrium, with the C/AAC
haplotype having a frequency of 0.65 on 190 tested chromosomes.



Figure 3 is a dot-plot comparison of U37574 with the sequence derived from genomic
clone 10A (Genbank accession U72483 (shown in the Sequence Listing as SEQ ID
NO:2)). For this comparison a window size of 15 and a match criterion of 12 were used.
The positions of the known and comparable exon structures for BRCA1, LBRCA1,
1A1.3B and L1A1.3B are marked along the axes in the same format as shown in Figure 2.
The significant gaps representing the largest differences between the sequences discussed
in the text are indicated.


DETAILED DESCRIPTION OF THE INVENTION 



The present invention relates generally to the field of human genetics. Specifically,
the present invention relates to a partial duplication of a human breast cancer predisposing
gene (BRCA1), and some polymorphic allelic forms which can be useful in tracking the
chromosomal arrangement of BRCA1. The invention also relates to the fact that there can
be mutations in the LBRCA1 and L1A1.3B promoter regions that can affect transcription of
the BRCA1 and 1A1.3B genes.

Mutations in the BRCA1 gene are associated with a highly increased risk of breast
or ovarian cancer development, and inheritance of defective forms of this gene may
account for approximately 5% of breast cancer cases. Altered expression or effective loss
of function of BRCA1 is likely to be important in sporadic breast and ovarian tumors as
well (Chen et al., 1995; Holt et al., 1996). Although a complete genomic structure of
BRCA1 is not yet available, a complete coding region cDNA sequence of BRCA1 has
been reported (Miki et al., 1994). The cDNA structure was further elucidated in a report
characterizing the promoter region of BRCA1 and describing the alternative use of exons
1a or 1b in different tissue types (Xu et al., 1995). An additional complexity of BRCA1
transcription was noted by Brown et al. (1994), who provided evidence that the 1A1.3B
gene identified by Campbell et al. (1994) and believed to encode the CA125 ovarian
cancer marker antigen is transcribed in a head-to-head fashion with BRCA1, with the
5'-most exons of each gene located at a distance of just 295 bp.

It is shown here that the BRCA1 and 1A1.3B promoter regions are highly similar
but represent distinct copies of a genomic duplication. One copy includes the
head-to-head arrangement of the 1A1.3B gene and a putative gene with 5' sequences similar to
BRCA1, referred to here as LBRCA1 (for Like BRCA1). The second promoter region
has a head-to-head arrangement of BRCA1 with a putative gene L1A1.3B (for Like
1A1.3B) that has a 5' structure similar to 1A1.3B. This view is supported by analysis of
genomic PCR products specific for each promoter region and of genomic clones that have
been propagated in recombination-deficient conditions. There is a high degree of
similarity of the two sequences, but also significant differences, consistent with functional
divergence since the time of the duplication event. New hypotheses regarding
mechanisms of breast and ovarian cancer etiology involving the newly recognized genetic
structures and putative genes are presented.

The data presented here demonstrate the existence of a direct genomic duplication
that includes the BRCA1 and 1A1.3B promoters as distinct elements. The alternative
forms of the duplication do not represent polymorphic variation because PCR reactions
with primers specific for each distinct segment showed products of the correct size with
all genomic samples (N > 90) and sequencing of such products showed the expected
single pattern (data not shown). This finding has a wide variety of implications, in part
because it significantly revises a generally accepted (Szabo and King, 1995) and
frequently cited aspect of BRCA1 gene structure.

The possible expression of LBRCA1 and L1A1.3B genes that include homologies
to BRCA1 and 1A1.3B throughout all or part of their length could pose previously
unrecognized difficulties for the development of specific antibodies and probes for precise
study of gene expression and function. Conflicting and apparently inconsistent
immunohistochemical data have been observed for both 1A1.3B (Campbell et al., 1994)
and BRCA1 (Scully et al., 1996; Chen et al., 1996). It is also very likely that DNA and
RNA hybridization results obtained to date with genomic and cDNA probes for 1A1.3B
and BRCA1 require some review. In some circumstances, evaluation of specific
expression of these loci may depend on RT-PCR analysis with specific primers or the
identification and verification of specific hybridization probes. More complete genomic
structural characterization and transcription analysis are needed to determine the
expression pattern and nature of any gene products from these loci.

Searches for mutations affecting BRCA1 transcription initiation must be carried
out using primers and PCR conditions that are completely specific for amplification of the
BRCA1 promoter region. The most significant published effort to screen for such
mutations (Friedmann et al., 1995) relied on primers designed from what is now
recognized as primarily the 1A1.3B and LBRCA1 promoter sequences. Since that
strategy failed to reveal the common polymorphisms that we have detected, the primer set
used could not have provided fully sensitive mutation screening coverage for the BRCA1
promoter region. This indicates the need for renewed experimental approaches to analysis
of the promoter for patients with "inferred regulatory" mutations (Gayther et al., 1995).
With respect to the coding regions of BRCA1, the possibility may exist that some
genomic mutations assigned to BRCA1 actually reside in LBRCA1 if genomic or
RT-PCR primer pairs thought to be specific for BRCA1 also amplify an identical sequence
from LBRCA1. The overall sequence similarity of approximately 94% observed in the promoter
regions of the two genes suggests that such confusion is not likely if this degree of
similarity is representative of the entire duplication. However, knowledge of the
sequences of the two similar regions will be useful in the design of PCR primers needed
for amplification of products specific for each region of the duplication.
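
By way of illustration only, the following Python sketch shows one way a candidate primer might be checked for specificity to one copy of the duplication. It is not the procedure used in the work described here (which relied on the PRIMER program and empirical PCR). The function names, the ungapped forward-strand comparison, and the thresholds (at least two mismatches against the other copy, at least one within the last five bases at the 3' end) are assumptions chosen for the example, and the sequences are invented toy sequences.

```python
def mismatch_positions(primer, target):
    """Return (start, mismatches) for the best ungapped placement of primer
    on target (forward strand only), or None if target is shorter than primer.
    mismatches lists 0-based positions along the primer, 5' to 3'."""
    primer, target = primer.upper(), target.upper()
    best = None
    for start in range(len(target) - len(primer) + 1):
        site = target[start:start + len(primer)]
        mism = [k for k, (p, t) in enumerate(zip(primer, site)) if p != t]
        if best is None or len(mism) < len(best[1]):
            best = (start, mism)
    return best


def is_copy_specific(primer, intended, paralog,
                     min_mismatches=2, three_prime_window=5):
    """Hypothetical rule of thumb: the primer must match the intended copy
    perfectly and, at its best site on the paralogous copy, carry at least
    min_mismatches mismatches, one or more of them near the 3' end."""
    on_target = mismatch_positions(primer, intended)
    if on_target is None or on_target[1]:
        return False  # no perfect site on the intended copy
    off_target = mismatch_positions(primer, paralog)
    if off_target is None:
        return True   # paralog too short to contain any site
    mism = off_target[1]
    near_3_prime = any(k >= len(primer) - three_prime_window for k in mism)
    return len(mism) >= min_mismatches and near_3_prime


# Toy sequences (invented): the paralog differs from the intended copy
# at two positions near the end.
intended = "TTGACCATGGCATTACGGATCCAGT"
paralog = "TTGACCATGGCATTACGGATCTAGA"
specific_primer = "CATGGCATTACGGATCCAGT"  # spans both differences
shared_primer = "TTGACCATGGCATTACGGAT"    # identical in both copies
print(is_copy_specific(specific_primer, intended, paralog))
print(is_copy_specific(shared_primer, intended, paralog))
```

A primer placed entirely within a stretch that is identical in both copies cannot discriminate between them, which is the situation described above for primers designed before the duplication was recognized.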

The finding that L1A1.3B and not 1A1.3B is located head-to-head with BRCA1
may imply a coordinate regulation and that the putative L1A1.3B gene/transcript shares a
greater functional interaction or a greater developmental and tissue-specific coordination
of expression with BRCA1 than does 1A1.3B. Therefore, mutations in L1A1.3B could
account for some instances of familial breast-ovarian cancer genetically linked to the
BRCA1 locus, but without any known mutation yet identified in the BRCA1 gene.

A second gene involved in both sporadic and familial ovarian cancers that is distal
to BRCA1 has been inferred by loss of heterozygosity (LOH) studies (Godwin et al.,
1994). Genetic linkage to 17q21 for families with a site-specific ovarian cancer
susceptibility has also been established (Steichen-Gersdorf et al., 1994). The newly
identified gene/promoter complex does lie distal to the BRCA1 gene at a close but not yet
defined distance, suggesting that observations of specific instances of LOH in ovarian
tumors that do not involve BRCA1 could involve this second locus. Some inherited
site-specific ovarian cancer may also be due to mutations in the newly identified genes or
promoter segments.

The presence of a duplication containing all or part of BRCA1 and 1A1.3B
suggests that recombination events or other homology-mediated genetic rearrangements,
occurring somatically or as heritable changes, could result in altered expression or
inactivation of genes located within or close to the duplicated segment. Examples of such
mechanisms include unequal exchanges resulting in the formation of chimeric genes with
inappropriate expression or function (Lifton et al., 1992) and deletion or gene conversion
events between highly similar gene sequences that result in non-functional arrangements
(White et al., 1988). The alternative possibilities of duplication or deletion of one or more
genes lying between sites of homology-mediated unequal exchange may also be involved
in disease etiology. An example of this is the PMP22 gene, located in proximal 17p
between two homologous 24 kb elements that are separated by 1.5 Mb (Kiyosawa and
Chance, 1996). Unequal exchange between these elements can cause a duplication of
PMP22, resulting in Charcot-Marie-Tooth disease Type I (Pentao et al., 1992) or a
deletion that causes hereditary neuropathy with liability to pressure palsies (Chance et al.,
1993).
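
The reciprocal outcome of an unequal exchange between two direct repeats can be sketched with a toy model. The Python fragment below is a minimal illustration, assuming a chromosome can be represented as an ordered list of segment labels; the labels "REP", "proximal" and "distal" and the function name are hypothetical, and real repeats are of course not identical tokens.

```python
def unequal_exchange(chrom, repeat="REP"):
    """Model a crossover between the first copy of a direct repeat on one
    homolog and the second copy on the other homolog.

    chrom is an ordered list of segment labels shared by both homologs.
    Returns the two reciprocal products (duplication, deletion)."""
    positions = [i for i, seg in enumerate(chrom) if seg == repeat]
    if len(positions) < 2:
        raise ValueError("need two copies of the repeat for unequal exchange")
    first, second = positions[0], positions[1]
    # One product keeps everything up to the second repeat of homolog 1 and
    # continues from the first repeat of homolog 2: the interval is doubled.
    duplication = chrom[:second] + chrom[first:]
    # The reciprocal product joins the first repeat region directly to the
    # second: the interval between the repeats is lost.
    deletion = chrom[:first] + chrom[second:]
    return duplication, deletion


normal = ["proximal", "REP", "PMP22", "REP", "distal"]
dup, dele = unequal_exchange(normal)
print(dup)   # two copies of the gene between the repeats
print(dele)  # no copy of the gene between the repeats
```

In this toy model the same exchange event yields one product with two copies of the intervening gene and one product with none, mirroring the duplication and deletion outcomes cited above for PMP22.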

Inversions caused by recombination between homologous 9.5 kb segments located
250-350 kb apart and in opposite orientation on the same chromosome are responsible for
almost 50% of the mutations in FVIII (Naylor et al., 1993; Lakich et al., 1993; Naylor et
al., 1995). It is notable that the FVIII gene in patients with hemophilia A was scrutinized
for 8 years before this common mutation mechanism was detected. As was the case for
FVIII, it is possible that large scale inversion, duplication or deletion mutations involving
the 1A1.3B/LBRCA1 and L1A1.3B/BRCA1 segments have been missed by
investigations to date. This is particularly true for the evaluation of tumor material, where
appropriate DNA specimens for analyses of very long fragments are usually unavailable
and the relevant detection methods are rarely applied. Mutation studies of individual
exons or even large cDNA segments would not result in identification of such changes,
because at least one normal gene copy would be present in cases of a large scale
rearrangement affecting one chromosome. Furthermore, PCR-based detection strategies
that do not specifically anticipate changes in the order or orientation of gene segments are
insensitive to such changes. Further elucidation of the complete genomic structure of the
duplication described here and development of appropriate detection methods will reveal
the contribution of specific long-range chromosomal rearrangements to the burden of
somatic genetic events causing sporadic cancer cases as well as inherited defects.
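
Why exon-by-exon assays can miss an inversion may be seen in a small model. The following Python sketch is illustrative only: it assumes a chromosome is a list of (name, strand) pairs, with two repeat copies in opposite orientation, and all segment names and function names are hypothetical.

```python
def invert_between(chrom, left_repeat, right_repeat):
    """Invert the interval between two oppositely oriented repeat copies.
    chrom is a list of (name, strand) tuples with strand +1 or -1."""
    i = chrom.index(left_repeat)
    j = chrom.index(right_repeat)
    if i >= j:
        raise ValueError("left_repeat must precede right_repeat")
    # Reverse the order of the intervening segments and flip each strand.
    flipped = [(name, -strand) for name, strand in reversed(chrom[i + 1:j])]
    return chrom[:i + 1] + flipped + chrom[j:]


def segment_names(chrom):
    """What a per-segment (e.g. per-exon) assay can see: which pieces exist."""
    return {name for name, _ in chrom}


def junctions(chrom):
    """What a junction-spanning assay can see: which pieces are adjacent."""
    return set(zip(chrom, chrom[1:]))


normal = [("exonA", 1), ("rep", 1), ("exonB", 1),
          ("exonC", 1), ("rep", -1), ("exonD", 1)]
inverted = invert_between(normal, ("rep", 1), ("rep", -1))

print(segment_names(normal) == segment_names(inverted))  # same pieces present
print(junctions(normal) == junctions(inverted))          # adjacency differs
```

Every segment is still present after the inversion, so an assay that amplifies each segment separately gives the same result for both chromosomes; only an assay that spans the junctions, or otherwise measures order and orientation, distinguishes them.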


REF-SSCP

Scanning of long PCR product fragments for DNA sequence variation was carried
out by methods similar to those described previously (Liu and Sommer, 1995), except that
gel-purified PCR products were uniformly labeled by 12 cycles of reamplification with
the same or internal PCR primers in the presence of alpha-33P-dNTPs and were then
digested with a series of appropriate restriction endonucleases before application to 0.5x
MDE gels (FMC Bioproducts) for detection of SSCP and heteroduplex variants.

Genomic library screening 

The LANL1701 flow-sorted chromosome 17 library (Longmire et al., 1993) was
provided by L. Deaven at the Los Alamos National Laboratory. The vector, lambda
CH40, grows in recA- bacteria, and the library has been propagated on the K802 recA-
host to significantly reduce the possibility of intra- or inter-clone recombination events
that might result in artifactual fusions. To screen this library, PCR of DNA from library
subpools was used to verify the presence of appropriate clones, followed by plaque
hybridization. Standard methods were used for phage clone growth, DNA extraction,
restriction digestion and construction of pUC8 plasmid derivatives containing each of the
EcoRI fragments of each CH40 clone for further hybridization analysis and DNA
sequencing.

Oligonucleotide hybridization

Plasmid and phage clone DNAs were denatured with NaOH and applied to
charged nylon filters (AMF CUNO), followed by air-drying, UV crosslinking and filter
pre-washing in 0.5% SDS, 0.1x SSC at 65°C. Oligonucleotide probes were 32P-labeled
with T4 kinase and hybridized to replicate filters in 6x SSC, 5 mM EDTA pH 8.0, 0.25%
non-fat dry milk for at least two hours at 37°C, followed by three successive 5 minute
washes in pre-warmed 5x SSC, 0.1% SDS at a temperature 10°C lower than the
oligonucleotide Tm, as calculated by the PRIMER program. Filters were blotted dry and
exposed to X-ray film for 1 to 16 days with an intensifier screen. The sequences of the
oligonucleotides used are as shown in Table 1.
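
The wash temperature rule above (10°C below the oligonucleotide Tm) can be illustrated with a short calculation. The Python sketch below is an approximation under stated assumptions: it uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), as a stand-in, which is not necessarily the formula implemented in the PRIMER program, so the values it returns may differ from those actually used. The function names are hypothetical, and the example sequence is the 214.3 primer from Table 1, used here only as a convenient input.

```python
def wallace_tm(oligo):
    """Estimate Tm in degrees C by the Wallace rule, 2*(A+T) + 4*(G+C).
    This is an assumed stand-in for the PRIMER program's calculation."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    if at + gc != len(oligo):
        raise ValueError("oligo contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


def wash_temperature(oligo, offset_c=10):
    """Wash temperature set offset_c degrees below the estimated Tm."""
    return wallace_tm(oligo) - offset_c


probe = "TGGATGGAGAACAAGGAATC"  # primer 214.3 sequence from Table 1
print(wallace_tm(probe), wash_temperature(probe))
```

The Wallace rule is a rough estimate intended for short oligonucleotides; a nearest-neighbor calculation would generally be preferred where accuracy matters.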



Sequencing and sequence analysis 

Manual cycle-sequencing of clones and PCR products was carried out as described
by Adams and Blakesley (1991). PCR products were purified from low melting agarose
using Promega Wizard columns. Sequencher 2.0 or 3.0 software (Gene Codes) was used
to generate restriction maps of known sequences, for assembly of manual sequencing data
and comparison of related sequences. DottyPlotter 1.0c software (BIONET, D.G. Gilbert,
1989) was used for comparison of sequences at different "stringencies" by dot plot
analysis. PRIMER 0.5 (Whitehead Institute 1991, Hudson et al., 1992) was used for PCR
primer analysis and design. Sequence similarity searches of Genbank and EMBL
sequence databases were conducted using the BLAST suite of programs (Altschul et al.,
1990) supported by the National Center for Biotechnology Information.



Chromosomal localization, STS analysis of genomic YAC and P1 clones

PCR reactions for chromosome localization and sub-localization using rodent-human
hybrid DNAs were carried out as described (Barker et al., 1993) using serial
dilutions of the template DNAs to allow identification of any artifactual positives due to
slight contamination by cells with different chromosomal complements. DNAs were
prepared by standard methods from YAC and P1 clones, obtained from the Baylor Human
Genome Center and Genome Systems respectively, and similarly analyzed. STS primers
for RNU2 (Genome Database), for 1A1.3B exons 12 and 19 (Campbell et al., 1994) and
TABLE 1
Sequences of Key Oligonucleotides

Primer  Sequence                                  Partner      Product  PCR Conditions
120.2   CCAGTACCCCAGAGCATCA (SEQ ID NO:3)         [illegible]           ([illegible] secs at 94°C, 60 secs at
                                                                        57°C, 90 secs at 72°C) x 30
120.3   [illegible]TC (SEQ ID NO:4)               120.2                 see above
214.3   TGGATGGAGAACAAGGAATC (SEQ ID NO:5)        42.2         BRCA1    (60 secs at 94°C, 60 secs at 60°C,
                                                                        165 secs at 72°C) x 6 followed by
                                                                        (30 secs at 94°C, 60 secs at 55°C,
                                                                        170 secs at 72°C) x 30
42.2    TGAACTTCTCCAAACCCTC (SEQ ID NO:6)         120.2        BRCA1    (30 secs at 94°C, 60 secs at 58°C,
                                                                        90 secs at 72°C) x 30
225.1   GGGCAGAAGCAACCTGA (SEQ ID NO:7)           225.4        1A1.3B   (45 secs at 94°C, 60 secs at 61°C,
                                                                        90 secs at 72°C) x 30
225.4   GGAGGGACAGAAAGAGCC (SEQ ID NO:8)          225.1        1A1.3B   see above
42.3    GGTCAGAATCGCTACCTATTG (SEQ ID NO:9)
1007.5  AGCTCGCTGAGACTTCCTG (SEQ ID NO:10)
214.2   GAAGTTGTCATTTTATAAACCTTT (SEQ ID NO:11)


For PCR primer pairs, the thermal cycling conditions and the specificity of the
product (BRCA1 or 1A1.3B) are as shown. Taq polymerase and standard reaction buffer
were from Promega and cycling was performed in a Perkin-Elmer 480 or Techne PHC-3
thermal cycler.
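
To make the cycling notation of Table 1 concrete, the sketch below encodes a program as a list of stages, each a tuple of step durations in seconds repeated for a number of cycles, and sums the programmed hold time. The data layout and function name are illustrative assumptions; ramp times between temperatures, which depend on the instrument, are ignored.

```python
from typing import List, Tuple

# A stage is (step durations in seconds, number of cycles).
Stage = Tuple[Tuple[int, ...], int]


def total_hold_seconds(program: List[Stage]) -> int:
    """Sum of programmed hold times; instrument ramp time is not included."""
    return sum(sum(steps) * cycles for steps, cycles in program)


# Primer 42.2 row of Table 1: (30 s at 94C, 60 s at 58C, 90 s at 72C) x 30
program_42_2: List[Stage] = [((30, 60, 90), 30)]

# Primers 225.1 + 225.4: (45 s at 94C, 60 s at 61C, 90 s at 72C) x 30
program_225: List[Stage] = [((45, 60, 90), 30)]

for name, prog in (("42.2", program_42_2), ("225.1/225.4", program_225)):
    print(name, total_hold_seconds(prog) / 60, "minutes")
```

A two-stage program, such as the one listed for primer 214.3, would simply be a list with two entries.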



for BRCA1 exon 11 (5'-AGTGATCCTCATGAGGCTTT-3' (SEQ ID NO:12) and 5'-
TTAACTGTCTGTACAGGCTTGAT-3' (SEQ ID NO:13), designed using information in
Genbank entry U14680 (shown in the Sequence Listing as SEQ ID NO:14)) were used for
the YAC and P1 analyses, in addition to primer pairs specific for each promoter.

The present invention is described by reference to the following Examples, which
are offered by way of illustration and are not intended to limit the invention in any
manner. Standard techniques well known in the art or the techniques specifically
described were utilized.

EXAMPLE 1

Detection of a Duplication Involving the BRCA1 Promoter Region 



PCR primers 120.3 and 120.2 (Table 1) were designed for amplification of the
BRCA1 promoter region from information presented by Brown et al. (1994). This
published sequence is apparently an improper fusion of 1A1.3B-specific sequences with
BRCA1-specific sequences, probably due to the very close similarity between the two
promoter regions that caused a clone rearrangement or a false contig assignment. This
initial primer pair included one (120.3) that corresponds to a region of near-identity in the
1A1.3B gene with its cognate and a second (120.2) that is BRCA1-specific. RE-SSCP
analysis of the 1300 bp PCR product revealed two polymorphic sites. However, the
restriction fragment patterns observed did not agree completely with those predicted from
the promoter sequence of Brown et al. (1994). DNA sequencing to identify the
polymorphic sites confirmed this apparent distinction. BLAST searches showed that
U37574, contributed by Xu et al. (1995), was essentially identical to the segment in which
the polymorphisms occurred. U37574 is a 3.8 kb genomic PstI fragment that includes the
BRCA1 promoter and the alternative 5' BRCA1 exons 1a and 1b as well as upstream
sequences (Xu et al., 1995).

By testing various pairs of additional primers designed from all the available
sequence information, we identified primer pairs and cycling conditions (Table 1) that
consistently amplified, from human genomic DNA, segments with distinct sequences that
were essentially identical to portions of either the Brown et al. (1994) sequence or the
U37574 sequence. Primers 42.2 and 214.3 (Table 1) were used to amplify a 2600 base
segment that extends from 1100 bp upstream of the first BRCA1 exon through BRCA1
exon 2. Sequencing of the 42.2 plus 214.3 PCR product obtained with a genomic DNA
sample carrying the BRCA1 185delAG mutation in exon 2 (Simard et al., 1994;
Struewing et al., 1995) revealed a simple sequence pattern heterozygous for 185delAG,
demonstrating that this genomic PCR product represents BRCA1. Sequencing of portions
of the upstream region of this same 42.2 plus 214.3 product was in essentially complete
agreement with the U37574 sequence, confirming the correspondence of U37574 to
BRCA1. In contrast, primers 225.1 plus 225.4 (Table 1) amplified a fragment with a
sequence corresponding to that of Brown et al. (1994) throughout nearly its entire length,
with divergence at a position close to the 225.4 primer indicating the site of the apparent
artifactual fusion. Comparison of the 225.1 plus 225.4 sequence to the BRCA1-specific
sequence revealed 6% non-identity of corresponding bases as well as 6 short
insertion/deletion differences, demonstrating the existence of two similar but distinct
genomic segments.
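
The two summary figures quoted above (percent non-identity of corresponding bases and the number of short insertion/deletion differences) can be derived from a gapped pairwise alignment. The sketch below is a minimal illustration on a toy alignment, not the actual sequences; it assumes the two strings are already aligned, that "-" marks a gap, and that each uninterrupted run of gap columns counts as one insertion/deletion difference.

```python
from typing import Tuple


def compare_aligned(a: str, b: str, gap: str = "-") -> Tuple[float, int]:
    """Return (percent non-identity over ungapped columns, number of indel runs)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    mismatches = compared = indel_runs = 0
    in_gap = False
    for x, y in zip(a, b):
        if x == gap or y == gap:
            if not in_gap:          # first column of a new gap run
                indel_runs += 1
            in_gap = True
            continue
        in_gap = False
        compared += 1
        if x != y:
            mismatches += 1
    pct = 100.0 * mismatches / compared if compared else 0.0
    return pct, indel_runs


# Toy alignment: one substitution, one 2-base gap, one 1-base gap
seq_a = "ACGTACGT--ACGTACGTAC"
seq_b = "ACGAACGTTTACG-ACGTAC"
pct, indels = compare_aligned(seq_a, seq_b)
print(f"{pct:.1f}% non-identity, {indels} indel differences")
```

Gap columns are excluded from the denominator so that the percentage reflects only corresponding bases, matching the way the two figures are reported separately in the text.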

EXAMPLE 2 

Characterization of the Location and Structure of the Duplication



The primer pairs specific for amplification of each of the distinct promoter
segments were used to determine their genomic localization using human-rodent hybrid
cell lines (vanTuinen et al., 1987) containing known portions of chromosome 17 (Figure
1A). In each case, a positive PCR reaction was observed with template DNA from hybrid
ND-1 but not from MH-41, placing both promoter complexes in a region between hybrid
breakpoints at 17q21.1 and 17q23.1, in agreement with the well-established localization
of BRCA1 at 17q21 (data not shown). Analyses of DNA from additional hybrids with
different chromosome 17 breakpoints, as well as DNA from hybrid MH-22, containing
human chromosome 17 as its only human complement, were consistent with a unique
localization at 17q21. STS analysis of genomic YAC and P1 clones included in physical
maps of the BRCA1 region (Albertsen et al., 1994; Neuhausen et al., 1994) is presented in
Figure 1B. The key observations are that P1 clone 746B4 contains a segment spanning
BRCA1 exon 11 and the BRCA1 promoter, but not including the 1A1.3B promoter, while
YAC clone 173B7 includes the 1A1.3B promoter and 1A1.3B exons 12 and 19, but not
the BRCA1 promoter. These and other data summarized in Figure 1B are most consistent
with the view that the two distinct promoters are corresponding elements of a direct
duplication at 17q21, with the gene loci oriented with respect to each other and the
chromosome as depicted in Figure 1A.

EXAMPLE 3 

Isolation and Refined Analysis of Genomic
Clones Containing the Duplicated Promoter Regions

As described above, genomic clones containing the BRCA1 or 1A1.3B promoters
and adjacent sequences were isolated from a rearrangement-resistant lambda library.
PCR analysis of the complete library DNA indicated the presence of both promoter
segments. All isolated clones hybridized strongly to the 225.1 plus 225.4 PCR product
used as probe, but revealed one of two distinct EcoRI restriction patterns. Clone 10A and
five similar isolates contained two EcoRI fragments of 7.0 and 9.2 kb. Clone 16C contained
EcoRI fragments of 7.1, 2.7, 2.5, 1.5 and 0.35 kb. Plasmid DNAs containing individual
EcoRI fragments of 10A and 16C were probed with oligonucleotides 42.2, 42.3, 225.4,
1007.5 and 214.2 (Figure 2, Table 1) to establish the fragment maps shown in Figure 1A.
The oligonucleotide hybridization analysis also confirmed that clone 16C includes the 5'
portion of BRCA1. Oligonucleotide 1007.5, corresponding to well-established BRCA1 5'
cDNA sequence (Figure 2, Table 1), showed strong hybridization to the 16C 7.1 kb
EcoRI fragment but weak hybridization to the 10A 9.2 kb EcoRI fragment, detectable
only with long autoradiographic exposure. Sequence analysis of the termini of the 1.5
and 7.1 kb EcoRI fragments of 16C showed that these do not include any of the CH40
vector sequences and are therefore "natural" EcoRI fragments. In contrast, each of the
EcoRI fragment subclones of 10A includes an "artificial" CH40 EcoRI end, showing that
the corresponding genomic EcoRI fragments must be longer than 7.0 and 9.2 kb (Figure
1A). The different EcoRI site patterns of these two clones show that the segments
containing the distinct promoter complexes are not overlapping or interdigitated.
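
The EcoRI fragment patterns discussed above were determined experimentally. For readers who wish to compare such patterns against a known sequence, the following sketch performs a simple in silico digest of a linear molecule. It is an illustration on a toy sequence only; the function name is hypothetical, and the cut position (G^AATTC, one base into the recognition site) is the standard EcoRI cleavage point.

```python
from typing import List


def digest_linear(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Fragment lengths from cutting a linear sequence at every occurrence of `site`.

    EcoRI cuts after the first G of GAATTC, hence cut_offset=1. The site is
    palindromic, so scanning one strand finds every site.
    """
    s = seq.upper()
    cuts = []
    start = s.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = s.find(site, start + 1)
    bounds = [0] + cuts + [len(s)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


# Toy sequence with two EcoRI sites
toy = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
print(digest_linear(toy))
```

For a cloned insert, the terminal fragments of such a digest end at vector-derived sites rather than genomic ones, which is the distinction drawn above between "natural" and "artificial" EcoRI ends.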
DNA sequencing of portions of the 1.5 and 7.1 kb EcoRI fragments of clone 16C
as well as of the BRCA1-specific PCR products described earlier was in essentially
complete agreement with the U37574 sequence. This identity was confirmed for
approximately 75% of the length of U37574, including all of the region upstream of
BRCA1 exon 1a, and all of BRCA1 exons 1b and 2. In contrast, the 10A sequence was
found to include the 1A1.3B exon 1a and 1b sequences as reported by Brown et al.
(1994). The region corresponding to the location of BRCA1 is also quite distinct.

Figure 3 shows a dot plot of the U37574 sequence vs. a corresponding segment of
clone 10A. This region includes 1A1.3B exons 1a and 1b and BRCA1 exons 1a, 1b and 2
and their cognates. The strong diagonal elements of the plot indicate a high degree of
sequence similarity across this entire length. There are three significant gaps in this
similarity. Gap1 (Figure 3) is due to a 340 bp insertion in LBRCA1 just at the beginning
of the sequence that corresponds to BRCA1 exon 1a (Figure 2). It is unclear whether this
insertion may be considered part of LBRCA1 exon 1a. It does not include homology to
any highly repeated human sequence.
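
For readers unfamiliar with dot plot analysis at different "stringencies", the sketch below shows the underlying idea: a dot is recorded at (i, j) whenever a window starting at position i of one sequence matches the window starting at position j of the other in at least a threshold number of positions. This is a generic illustration with hypothetical parameter names, not a description of how the DottyPlotter software is implemented.

```python
from typing import List, Tuple


def dot_plot(a: str, b: str, window: int = 10, min_matches: int = 8) -> List[Tuple[int, int]]:
    """Return (i, j) pairs where a[i:i+window] and b[j:j+window] share
    at least `min_matches` identical positions."""
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(1 for k in range(window) if a[i + k] == b[j + k])
            if matches >= min_matches:
                hits.append((i, j))
    return hits


# Two toy sequences differing by a single substitution
x = "ACGTTGCA"
y = "ACGTAGCA"
print(dot_plot(x, y, window=4, min_matches=4))  # high stringency
print(dot_plot(x, y, window=4, min_matches=3))  # lower stringency
```

At the higher stringency only the perfectly matching window is reported, while at the lower stringency the full diagonal appears. An insertion in one sequence shifts the diagonal and leaves a gap, which is how features such as Gap1 become visible.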

Gap2 is due to an additional 61 bp of sequence within the segment of LBRCA1
that corresponds to BRCA1 exon 1b. As indicated in Figures 2 and 3, nearly all of the
LBRCA1 "exon 1b-like" sequence and a significant part of the BRCA1 exon 1b sequence
correspond to a region of homology with the Alu repeat element. The additional 61 bases
that are present in the LBRCA1 gene represent that part of the Alu element that is missing
from BRCA1 exon 1b. This difference strongly suggests that the Alu element at this
position existed prior to the duplication event and that part of this Alu was lost in the
further evolution of BRCA1 exon 1b. The finding that exon 1b is derived from an Alu
element is an example of a phenomenon already described for a variety of other known
genes (Makalowski et al., 1994; Baban et al., 1996). Since the Alu element is found only
in primates, the proposed duplication almost certainly occurred after the genomic
dispersion of this element in the primate genome. The function of exon 1b in BRCA1 is
also very likely to be unique to primates.

Gap3 (Figure 3) corresponds to a region upstream of BRCA1 exon 2. At this
position, LBRCA1 includes a complete Alu element in opposite orientation to the exon 1b
Alu element. This Alu is missing from the BRCA1 gene. However, there are about 60
non-contiguous basepairs in BRCA1 at this position that are not present in LBRCA1. A
fourth notable feature in Figure 3, the outlying diagonal element, is due to the presence of
an additional Alu repeat, downstream of BRCA1 exon 2, and in the same orientation as
the exon 1b element. The known sequence of the LBRCA1 segment ends within this
repeat.

Differences between the 1A1.3B exons and L1A1.3B are small and only evident
upon closer inspection of Figure 3. The sequences of L1A1.3B that correspond to exons
1a and 1b of 1A1.3B both reveal short deletions, totaling 24 and 10 basepairs respectively.
Since neither of the 1A1.3B 1a or 1b exons encodes any known translation product, the
significance of these differences is not apparent.

EXAMPLE 4 

Implications of DNA Structure for Expression of LBRCA1 and L1A1.3B 

The fact that each of the gene complexes, L1A1.3B/BRCA1 and
1A1.3B/LBRCA1, contains one gene with well-established transcriptional activity shows
that both of the newly identified promoters are active. The possibility of functional
transcription of the LBRCA1 and L1A1.3B genes is supported by the overall structure
and sequence similarity of the promoter complex regions (Figures 2 and 3) as well as the
conservation of splicing sequences for the presumptive exons of both LBRCA1 and
L1A1.3B. The BRCA1 start site and coding frame in exon 2 are not conserved in
LBRCA1; however, there is a potential ATG start site close to the end of the sequence that
is similar to exon 2 (Figure 2).

A feature of the region of Alu similarity in BRCA1 exon 1b and the corresponding
segment of LBRCA1 (Figure 2) that is likely to be significant for expression is the
presence of an Alu-related estrogen responsive element (ERE) as defined by Norris et al.
(1995), that functions as an estrogen-dependent transcription enhancer. By comparison of
two functionally defined ERE elements, one of them derived from an unknown location
within the 5' 50 kb of BRCA1, these authors proposed the consensus sequence
GGTCA(N)3TGGTC(N)9TGACC (SEQ ID NO:15). This sequence was found within Alu
elements, aligned in reverse orientation with respect to the "sense" Alu orientation
(indicated by the arrows in Figure 2). Corresponding segments of BRCA1 exon 1b and
LBRCA1 include a perfect match to this consensus, with the expected orientation relative
to the Alu. Flanking sequences distinguish both of these elements from that reported by
Norris et al. (1995), showing that the 5' BRCA1 region contains at least two EREs. No
other consensus EREs are present in the segments shown in Figure 2.
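
A search for the ERE consensus GGTCA(N)3TGGTC(N)9TGACC on both strands can be sketched with regular expressions, as below. The function name and return format are illustrative assumptions. The 50-base test string is an excerpt transcribed from the SEQ ID NO:1 listing in this document; on the strand as listed it matches the reverse complement of the consensus, which is consistent with the reverse orientation described above.

```python
import re
from typing import List, Tuple

# Consensus as written (SEQ ID NO:15), and its reverse complement
ERE_FWD = re.compile(r"GGTCA[ACGTN]{3}TGGTC[ACGTN]{9}TGACC")
ERE_REV = re.compile(r"GGTCA[ACGTN]{9}GACCA[ACGTN]{3}TGACC")


def find_ere(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start on the given strand, strand) for each consensus match."""
    s = seq.upper()
    hits = [(m.start(), "+") for m in ERE_FWD.finditer(s)]
    hits += [(m.start(), "-") for m in ERE_REV.finditer(s)]
    return sorted(hits)


# Excerpt from the SEQ ID NO:1 listing (includes one ambiguous base, N)
excerpt = "TCGGCTGGTCATGAGGTCAGGAGTTCCAGACCAGCCTGACCAACGTNGGT"
print(find_ere(excerpt))
```

Because the half-sites GGTCA and TGACC are reverse complements of each other, only the internal TGGTC block distinguishes the two orientations.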

EXAMPLE 5 

Using Polymorphisms to Track BRCA1 Chromosomal Rearrangements 



Several genes in the human genome that are duplicated have now been identified.
These gene duplications are often a result of unequal crossing over events that have
occurred during the evolutionary history of the human species. Often both elements of
the duplicated segment have subsequently evolved functions that are essential for normal
development and health. Events that occur during the growth of cells of a single human
individual can result in unequal crossing over that reverses the effect of the evolutionary
event, destroying one or both of these functions. Another possible outcome of unequal
crossing over is further expansion of the duplicated region, which may also result in
destruction of functional gene arrangements.

Duplications and deletions of specific genes have been associated with disease
states. Unequal exchange within the PMP22 gene may result in Charcot-Marie-Tooth
disease Type I if there is a duplication (Pentao et al., 1992) or it may result in hereditary
neuropathy with liability to pressure palsies if there is a deletion (Chance et al., 1993).
Inversions caused by recombination in the Factor VIII gene are responsible for almost
50% of the cases of hemophilia A (Naylor et al., 1993; Lakich et al., 1993; Naylor et al.,
1995). The iduronate-2-sulphatase (IDS) gene is duplicated in the genome and
recombination between the IDS gene and its second locus (IDS-2) is the cause of Hunter
Syndrome in 13% of patients with this disease (Bondeson et al., 1995). Unequal
crossing-over between 11β-hydroxylase and aldosterone synthase leading to partial
duplication of both genes with the 5' regulatory region of 11β-hydroxylase fused to the
coding sequence of aldosterone synthase causes glucocorticoid-remediable aldosteronism
(Lifton et al., 1992).
The region containing the BRCA1 and 1A1.3B genes is partially duplicated in the
human genome and this duplication enhances the chance that chromosomal rearrangement
will occur via unequal crossing over between the homologous elements of the duplicated
structures. This could easily result in inactivation of these genes or in other pathological
gene arrangements. Determining the presence of such a mutation can be more difficult
than finding a point mutation, and the mutation may be missed. If screening for mutations
is limited to sequencing the complete coding regions of these genes, a recombination that
occurred within an intron will likely not be seen. This would be the result if the gene is
sequenced by first amplifying the gene via PCR using sets of primers that amplify only the exons.
The results of such screening could well show that all of the exons are present within the
genome and may find no mutations such as point mutations, insertions, deletions, etc.
Nevertheless, if a recombination has occurred within the gene resulting in an unequal
crossing over, at least one of the two chromosomes will in fact not have an intact gene
and the gene on that chromosome will be inactive.

Unequal crossing over may occur within somatic tissue or may occur in the
germline. If such occurs within a cell in somatic tissue then that cell may be the start of a
tumor. If the rearrangement occurs within the germline then one of the recombined
chromosomes may be passed on to progeny. These descendants may receive a wild-type
gene from an unaffected parent and the recombined chromosome from the affected parent.
Loss of the active gene (loss of heterozygosity) within a cell in such a person will likely
cause that cell to be the start of a tumor. Clearly, a person carrying a chromosome in
which there has been a genetic rearrangement affecting the BRCA1 gene, thereby
inactivating it, is at as much risk of developing breast or ovarian cancer as is a person with
a point mutation or deletion or insertion within a single chromosome which is known to
be associated with these cancers. Methods for detecting these rearrangements will be
very useful, certainly just as useful as methods for detecting the point mutations, deletions
and insertions within BRCA1 that are known to be associated with breast and ovarian
cancer.

One method of tracking chromosomal rearrangements is to look at polymorphisms
that occur within the genes. Several polymorphisms are now known for BRCA1. Two
new polymorphisms are disclosed here. These two polymorphisms occur in the BRCA1
promoter region (nucleotides 612 and 980-982 of U37574 (SEQ ID NO:1)). Nucleotide
612 may be either C or T and nucleotides 980-982 may be either AAC (as shown in
U37574) or AACAAC. These two polymorphisms show a high level of linkage
disequilibrium. Because of this high linkage disequilibrium there are only two
"genotypes" so far as looking at the combination of these two polymorphisms, i.e., a
chromosome will have either C/AAC or it will have T/AACAAC. The C/AAC haplotype
has a frequency of 0.65.
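
The haplotype frequency given above allows a rough estimate of how often these polymorphisms will be informative, since only heterozygous individuals can be scored for loss of one chromosome. The sketch below assumes Hardy-Weinberg proportions and that C/AAC and T/AACAAC are the only two haplotypes; both are simplifying assumptions made for illustration, not results reported here.

```python
from typing import Dict


def genotype_frequencies(p: float) -> Dict[str, float]:
    """Expected genotype frequencies for a two-haplotype system
    under Hardy-Weinberg proportions (an assumption)."""
    q = 1.0 - p
    return {
        "C/AAC homozygote": p * p,
        "heterozygote": 2.0 * p * q,
        "T/AACAAC homozygote": q * q,
    }


freqs = genotype_frequencies(0.65)  # C/AAC haplotype frequency from the text
for genotype, f in freqs.items():
    print(f"{genotype}: {f:.4f}")
```

Under these assumptions, somewhat under half of individuals (2 x 0.65 x 0.35 = 0.455) would be expected to be heterozygous and therefore informative.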

These polymorphisms may be used to track recombination within somatic tissue.
If both chromosomes have the same "genotype", i.e., both are C/AAC or both are
T/AACAAC, then it will be uninformative to use these polymorphisms to study
recombination within BRCA1. If the person is heterozygous for these polymorphisms,
i.e., one chromosome contains C/AAC and the other chromosome contains T/AACAAC,
then use of these polymorphisms will be perfectly informative in assaying for
recombination within BRCA1. To perform such an assay, germline tissue is assayed for
the presence of these two polymorphisms. If the person is heterozygous then both
genotypes will be seen. Somatic tissue is also analyzed. If the somatic tissue shows only
one of the two genotypes then clearly the chromosome carrying the other genotype has
been deleted for the region containing at least that portion of BRCA1 containing the
polymorphic site. Such a result would indicate a high probability that the suspect tissue is
indeed cancerous. This would be strengthened by the knowledge that the person contains
a mutation known to be associated with breast and ovarian cancer. This test confirms the
loss of heterozygosity which may lead to cancer when the wild-type gene is lost. Note
that if the person were homozygous then this test would not be applicable since only one
genotype is present and this genotype will be seen regardless of whether there are two
copies or one copy of the polymorphism present. If a person were hemizygous due to
inheriting a BRCA1 gene which was partially deleted then the above assay would work in
that loss of the wild-type copy of the gene would result in the presence of zero copies of
the polymorphisms and this would be noted by an inability to amplify the gene region.
However, one must be able to know that the person was hemizygous rather than
homozygous to utilize such an assay.
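
The decision logic of this assay can be summarized in a few lines. The sketch below is an illustration only: the function name, labels and input format (the set of haplotypes detected in each sample) are assumptions, and it does not attempt to handle the hemizygous case, which as noted requires independent knowledge of copy number.

```python
from typing import Set

HAPLOTYPES = {"C/AAC", "T/AACAAC"}


def classify_loh(reference: Set[str], suspect: Set[str]) -> str:
    """Compare haplotypes seen in the reference (germline) sample with those
    seen in the suspect somatic tissue."""
    if not reference or not reference <= HAPLOTYPES or not suspect <= HAPLOTYPES:
        raise ValueError("unexpected or missing haplotype call")
    if len(reference) == 1:
        # Homozygous (or hemizygous): one pattern is seen whatever the copy number
        return "uninformative"
    if len(suspect) == 2:
        return "heterozygosity retained"
    if len(suspect) == 1:
        return "loss of heterozygosity"
    return "no amplification"


print(classify_loh({"C/AAC", "T/AACAAC"}, {"C/AAC"}))
print(classify_loh({"C/AAC"}, {"C/AAC"}))
```

A result of "loss of heterozygosity" corresponds to the situation described above in which the chromosome carrying the other haplotype has been deleted for at least the polymorphic region.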
Such an assay is not limited to the two polymorphisms noted above but may be
used with other polymorphisms within the BRCA1/LBRCA1 region. Several
polymorphisms have been published for BRCA1. Two new polymorphisms within
LBRCA1 have been discovered and are presented here. One is at base 1723 of SEQ ID
NO:2 and the other at base 2182 of SEQ ID NO:2. The base at both of these positions
may be either G or A.

The above method is suitable for tracking recombination within somatic tissue but
not in the germ line. If a person has inherited one wild-type gene and one gene with a
deletion of the chromosomal region between LBRCA1 and BRCA1, the person will be
hemizygous for the noted polymorphisms if the recombination has deleted them from one
chromosome. This hemizygous person will appear homozygous for either the C/AAC or
the T/AACAAC polymorphism if those are the polymorphisms being examined. If a
person has inherited one wild-type gene and one gene with a duplication of the
chromosomal region between LBRCA1 and BRCA1, the person may have three copies of
the gene region containing the polymorphisms. Such a person could be either
homozygous or heterozygous for the polymorphisms. Regardless of whether a person
with a germline rearrangement has one copy or three copies of the gene region, if the
recombination occurred within introns, simply sequencing exons will not discover this
rearrangement. Nevertheless, the copy number of the gene region containing the
polymorphism may be determined by methods such as quantitative PCR (see, e.g.,
Volkenandt et al., 1992; Gilliland et al., 1990; Pastore et al., 1996) or fluorescent in situ
hybridization (FISH). FISH analysis would easily discern a deletion of the region.
Whereas many, possibly most, genes in the human genome are not duplicated in part or
whole and genetic recombination within the gene would be expected to be quite rare,
BRCA1 and its contiguous gene L1A1.3B are partially duplicated (as LBRCA1 and
1A1.3B) and this region is therefore much more likely to undergo unequal crossing over
leading to gene deletion or duplication. Analysis of such is therefore more important with
BRCA1 than it will be with genes which are not duplicated. The presence of the Alu
repeat within the BRCA1-1A1.3B genes makes crossing over an even more likely event
than for genes without such a repeat.
Analysis of the copy number of BRCA1 need not be limited to use of the
polymorphic regions. Any region may be used. The importance of the analysis is that if
the test results indicate the presence of only a single copy then this is equivalent to having
a mutation known to be associated with breast or ovarian cancer since there is only a
single (at most) wild-type copy of the gene present. If the test results indicate the
presence of three copies of the gene region then again this is cause for concern because it
will indicate the presence of a duplication of the gene region with the possibility that the
duplication is a result of unequal crossing over which has inactivated the BRCA1 gene. If
so then again there would be at most one copy of wild-type BRCA1 present. Clearly the
knowledge of copy number of the BRCA1 gene is as important as knowing the presence
of point mutations.
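
The interpretation of copy number results described above can be sketched as follows. The sketch assumes that a quantitative method has already produced a signal ratio for the BRCA1 region relative to a control region known to be present in two copies; the function names, the rounding rule and the wording of the interpretations are illustrative assumptions rather than a validated procedure.

```python
def estimate_copies(sample_to_control_ratio: float, control_copies: int = 2) -> int:
    """Convert a normalized signal ratio into the nearest whole copy number."""
    return round(sample_to_control_ratio * control_copies)


def interpret_copies(copies: int) -> str:
    """Map a copy number of the BRCA1 gene region onto the cases discussed in the text."""
    if copies == 1:
        return "single copy: at most one wild-type copy present"
    if copies == 2:
        return "two copies: no copy number change detected"
    if copies == 3:
        return "three copies: possible duplication from unequal crossing over"
    return "unexpected copy number: repeat analysis"


for ratio in (0.5, 1.0, 1.5):
    n = estimate_copies(ratio)
    print(ratio, n, interpret_copies(n))
```

In practice a measured ratio would carry experimental error, so values falling between the expected ratios would call for repeat measurement rather than simple rounding.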



It will be appreciated that the methods and compositions of the instant invention
can be incorporated in the form of a variety of embodiments, only a few of which are
disclosed herein. It will be apparent to the artisan that other embodiments exist and do
not depart from the spirit of the invention. Thus, the described embodiments are
illustrative and should not be construed as restrictive.
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CGTATTCTGA GAGGCTGCTG CTTAGCGGTA GCCCCTTGGT TTCCGTGGCA ACGGAAAAGC 
GCGGGAATTA CAGATAAATT AAAACTGCGA CTGCGCGGCG TGAGCTCGCT GAGACTTCCT 
GGACGGGGGA CAGGCTGTGG GGTTTCTCAG ATAACTGGGC CCCTGCGCTC AGGAGGCCTT 
CACCCTCTGC TCTGGGTAAA GGTAGTAGAG TCCCGGGAAA GGGACAGGGG GCCCAAGTGA 
TGCTCTGGGG TACTGGCGTG GGAGAGTGGA TTTCCGAAGC TGACAGATGG GTATTCTTTG 
ACGGGGGGTA GGGGCGGAAC CTGAGAGGCG TAAGGCGTTG TGAACCCTGG GGAGGGGGGC 
AGTTTGTAGG TCGCGAGGGA AGCGCTGAGG ATCAGGAAGG GGGCACTGAG TGTCCGTGGG 
GGAATCCTCG TGATAGGAAC TGGAATATGC CTTGAGGGGG ACACTATGTC TTTAAAAACG 
TCGGCTGGTC ATGAGGTCAG GAGTTCCAGA CCAGCCTGAC CAACGTNGGT GAAACTCCGT 
CTCTACTAAA AATACAAAAA TTAGCCGGGC GTGGTGCCGC TCCAGCTACT CAGGAGGCTG 
AGGCAGGAGA ATCGCTAGAA CCCGGGAGGC GGAGGTTGCA GTGAGCCGAG ATCGCGCCAT 
TGCACTCCAG CCTGGGCGAC AGAGCGAGAC TGTCTCAAAA CAAAACAAAA CAAAACAAAA 
CAAAAAACAC CGGCTGGTAT GTATGAGAGG ATGGGACCTT GTGGAAGAAG AGGTGCCAGG 
AATATGTCTG GGAAGGGGAG GAGACAGGAT TTTGTGGGAG GGAGAACTTA AGAACTGGAT 
CCATTTGCGC CATTGAGAAA GCGCAAGAGG GAAGTAGAGG AGCGTCAGTA GTAACAGATG 
CTGCCGGCAG GGATGTGCTT GAGGAGGATC CAGAGATGAG AGCAGGTCAC TGGGAAAGGT 
TAGGGGCGGG GAGGCCTTGA TTGGTGTTGG TTTGGTCGTT GTTGATTTTG GTTTTATGCA 
AGGGAAAGAA AACAACCAGA AACATTGGAG AAAGCTAAGG CTACCACCAC CTACCCGGTC 
AGTCACTCCT CTGTAGCTTT CTCTTTCTTG GAGAAAGGAA AAGACCCAAG GGGTTGGCAG 
CAATATGTGA AAAAATTCAG AATTTATGTT GTCTAATTAC AAAAAGCAAC TTCTAGAATC 
TTTAAAAATA TAGGACGTTG TCATTAGTTC TTTGGTTTGT ATTATTCTAA AACCTTCCAA 
ATCTTAAATT TACTTTATTT TAAAATGATA AAATGAAGTT GTCATTTTAT AAACCTTTTA 
AAAAGATATA TATATATGTT TTTCTAATGT GTTAAAGTTC ATTGGAACAG AAAGAAATGG 
ATTTATCTGC TCTTCGCGTT GAAGAAGTAC AAAATGTCAT TAATGCTATG CAGAAAATCT 
TAGAGTGTCC CATCTGGTAA GTCAGCACAA GAGTGTATTA ATTTGGGATT CCTATGATTA 
TCTCCTATGC AAATGAACAG AATTGACCTT ACATACTAGG GAAGAAAAGA CATGTCTAGT 
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AAGATTAGGC TATTGTAATT GCTGATTTTC TTAACTGAAG AACTTTAAAA ATATAGAAAA 3120 

TGATTCCTTG TTCTCCATCC ACTCTGCCTC TCCCACTCCT CTCCTTTTCA ACACAAATCC 3180 

TGTGGTCCGG GAAAGACAGG GACTCTGTCT TGATTGGTTC TGCACTGGGG CAGGAATCTA 3240 

GTTTAGATTA ACTGGCATTT TGGCTTTTCT TCCAGCTCTA AAACAAGCTC CATCACTTGA 3300 
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AATGGCAAAA TAAAATCATG GATGAGGCCG AGGGCGGTGG CTTATGCCTG TAATCCCAGC 3360 

ACTTTGGGAG GCCAAGGTGG TAGGATCACG AGGTCAGGAG ATCGAGACCA TCCTGGCCAA 3420 

CATGGTGAAA CCCCCTCTCC ACTAAAAATA CAAAAATTAG CTGGGCGTAG TGGCATGTGC 3480 

CTGTAATCCC AGCTACTCAG GAGGCTGAGG CAGGAGAATC ACTTGAACCA GGAGGCAGAT 3540 

GTTGCTGTGA GCCAATATGG CACCACTGAA CTCCAGCGAC AGAGCTAAAC TCCATCTCAA 3600 

AAAAAAAAAA AAAAAAAAAN AAACATGGAT GATCGGTGTC GTTGAGAGGA TAGGTATTTG 3660 

GAAGAACCTT TGTTTGAAAC TGGCTCTGTA CATACAATGA AATTACATAC TTATTTACAT 3720 

ACAATGAAAT GCAGAGGTTT TTTTTTTATA TAGGATCTCT GTCGAGAGGC TGGAGTGCAG 3780 

TGGTGCTATC ACAGCTCA 3798 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3800 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 
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GGGAGTCGTG CCTTCCATCA GTAGAAGCCG GATGTTCTGA CCCACAGACT CTCCAACTCT 60 

CCGGCGCTTC TCGCCAACTC GGTCCCTCTG AACATGAAGG GCTCTCTCAT CCTGTCACTA 120 

AAAAGATTAG CTGTCCCGAA ACACGGAAAA AGTCGCCCCT CTTCTTTGCA GGATTCCTCC 180 

CTTGAACTTC CCCAAACCCT CTTAGCGTGA CGTGACCCCA CCCCTAGGTA ACCGCAGCTG 240 

CTTCCTTACC AGCTTCCCGC CCCCGGGGGG CGCCTGCCGG AGGCCAATGC AAGGACCGTC 300 

CGCTACCGGC TCTGCCGCTA TCCCTGTGGG GTGAATCTAA CATGGCGGAT AAAGACAGTA 360 

ACTAGTCCCC TGTTTCTCCG AGTGTTCGCC AAGATGATTG GCTCTCACCA CTTGTCCCTC 420 

AAAACGACCA CGCCATTGAT TGGTGGAGAT TGCGTCGATG GGGCGGGGCA GAAGCAACCT 480 

GAACCCGAAC AACAATAACA AACATTGAGG CTGAGGGGCG GAACTAGGAG TGCGCAGATG 540 

TGGGCCAGAG CGGATTTCCC CTTCCCCAGG CAAATTCGGC GCCCACTGCG TCCCCGCAGG 600 

WO 98/23779 PCT/US9' 

CCACTGACCT TAGAGGACTA CTTGCCCGAG ACTCGTGGGG CTGGATGGGA ATCGTAGTCT 
TCCTAGGAGT TGTAGGTATC TTTTTTTGGC CTAGTCTCTG CTCTCAAGAT AGGAGAACAT 
AACAACACTC CAATCCATTA CTGTTGACAT GTATAAGCCC GCGGAGGTCT CCAATCTATC 
CACTGGATTT CCGTGAGAAT TGTGCCCGCT TTGGTATTGG ATGTTCCTCT CCATAAGACT 
ACAGTTTCCA AGGAACAGTG TGGCCAAGGC CTTTCGTTCC GCAATGCATG TTGGAAATAG 
TAGTTCTTTC CCTCCACCTC CCAACAATCC TTTTATTTAC CTAAACTGGA GACCTCCATT 
AGGGCGGAAA GAGTGGGGTA ATGGGACCTC TTCTTAAGAC TGCTTTGGAC ACTATCTTAC 
GCTGATATTC AGGCCTCAGG TGGCGATTCT GACCTTGGTA CAGCAATTAC TGTGACGTAA 
TAAGCCGCAA CTGGAAGCGT AGAGGCGAGA GGGCGGGCGC TTTACGGCGA ACTCAGGTAG 
AATTCTTCCT TTTCCGTCTC TTTCTTTTTA TGTCACCAGG GGAGGACTGG GTGGCCAACC 
CAGAGCCCCG AGAGATGCTA GGCTCTTTCT GTCCCGCCCT TCCTCTGACT GTGTCTTGAT 
TTCCTATTCT GAGAGGCTAT TGCTCAGCGG TTTCCGTGGC AACAGTAAAG CGTGGGAATT 
ACAGATAAAT TAAAACTGTG GAACCCCTTT CCTCGGCTGC CGCCAAGGTG TTCGGTCCTT 
CCGAGGAAGC TAAGGCCGCG TTGGGGTGAG ACCCTCACTT CATCCGGTGA GTAGCACCGC 
GTCCGGCAGC CCCAGCCCCA CACTCGCCCG CGCTATGGCC TCCGTCTCCC AGCTTGCCTG 
CATCTACTCT GCCCTCATTC TGCAGGACTA TGAGGTGACC TTTACGGAGG ATAAGATCAA 
TGCCCTTATT AAAGCAGCCA GTGTAAATAT TGAAACTTTT TGGCCTGGCT TGTTTGCAAA 
GGTCCTGGCC AACGTCAACA TTGGGAGCCA CATCTGCAAT GTAGAGGGGG GGAAAAAAAC 
GTGACTGCGC GTCGTGAGCT CGCTGAGACG TTCTGGACGG GGGACAGGCC GTGGGGTTTC 
TCAGATAACT GGGCCCCTGG GCTCAGGAGG CCTGCACCCT CTGCTCTGGG TTAAGGTAGA 
AGAGCCCCGG GAAAGGGACA GGGGCCCAAG GGATGCTCCG GGGGACGGGC GGGGGAAAGT 
GAATTTCCGA AGCTAGGCAG ATGGGTATTC TTATGCGAGG GGCGGGGGCG GAACCTGAGA 
GGCATAAGGC GTTGTGAACC CCCCGGGGAA GGGGGCAGTT TGTAGGTCTC GAGGGAAGCA 
CTAAGGATCA GGTTGGGGGC ACAGTGTGTC CGAGGAGGAA TCCTCCTGAT AGGAACTGGA 
ATGTGCCTTG AAGGGGACAC CATGTGTATA AGAACATCAG CTGGTCGCCG GGGATGGTGG 
CTTACGCCTG TATTCCTAGC ACTTTGGGAG GCCAAGGCGG ATGGATCACG AGGTCAGGAG 
TTCGAGACCA GCCTGACCAT CGTGGTGAAA CCCCGTCTCT ACTAAAAATA CAAAAATTAG 
CCGGGCGTGG TGGCGCGCGC CAGCTACTCA GGAGCTGAGG CAGGAGAATC GCTTGAACCC 
AGGAGGCGGA GGTTGCAGTG AGCCGAGATC GCGCCATTGC ACTCCAGCCT GGGTGGCAGA 
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ACGACACTCC GTCTCAAAAA CAAACAAAGA AATAAACACC GGCTGGTATA TATGAGAAGA 
TGGGCCCTTG CGGAAGAAGA AGTGCCAGGA ATATGTCTGG GAAGGGGAGG AGACAGGATT 
TTGTGGGAGG GAGAACTTAA GAACTGGATC CATTTGTGCT ATTGAGAAAG CGCAAGAGGG 
AAGTAGAGGA GCGTCAGTAG TAACAGATGC TGCCGGCAGG GATGTGCTTG AGGGGGATCC 
TGAGATGAGA GTGGGTCGCT GGGAAAGGCT AGGGGCAGGG AGGCCTTGAT TGGTGTTGGT 
TTGGTCGTTG TTGATTTTGG TTTTATGCAA GAAAAAGAAA ACAGCCAGAA GCATTGGAGA 
AAGCTCACCA CTTACCCGGT CAGTCACTCC CCTGTAGCTT TCTCTTTCTT GGAGAAAGGA 
AAAGACCCAA AGGGTTGGAA GCAATATGTG AAAAAATACA GAATTTATAT TGTCTAATTA 
CAAAAAGCAA CTTCTAGAAC CTTTAAAGGA TTTTGTATTA TTCTAAAACC TTCCAAATCT 
TAAATTTACC TTATTTTATT TTATTTATTT NTGAGACGGA GCTTCGCTCT TGTTGCCCAG 
GCTGGAGTGT AATCGGCGTG ATTTGGGCTC ACCGCAACCT CTGACTCGTG GGTTCAAGCG 
ATTCTCCTGC CTCAGCTCCC GAGTAGCTGG GATTACACGC ATGCACCACC ATGCCTGGCT 
CATTTTTTTG TATTTTTAGT AGAAACGAGG TTTCTCCGTA TTGGTCAGGC TGGTCTTGAA 
CTCCCGACCT CAGGTCATCC GCCCGCCTCG GCCTCCCTAA GTGCTGTGAT TGACAGGCGT 
GAGCCACCGA CGCCCAGCCC AATTTACCTT ATTTTAAAAT GATAAAATGA AGTTGTCATT 
TTTCTAAACC TTTTTAAAAG ATACATGTTT TTCTAATGTG TTAAAGTTCA TTGGAACAGA 
AAGAGATAGA TTTATCTGCT GTTTGCGTTG AAGAAGTACA AAATGTCCTT AATGCTATGC 
AGAAAATCTT ACAGTGTCCA ATCTGGTAAG TCACCAGAAG AGGGTATTAA TTTGGGATTC 
CTATATGATT ATCTCCTATG CAAATGAACA GAATTGACCT TACATAGAAG GGAGGAAAAG 
ACATGTCTAA TAAGATTAGG CTATTGTAAT TGCTGATTTT CTTAACTGAA GAACTTTAAA 
AGTATAGAAA ATGAATCCTT GTTCTCCATC CACTCTGCCT CTCCCACTCC TCTCCTCTTC 
AACACAAATC CTGTGGTCCC TGAAAGACAG GGACCCTGTC TTGATTGGTT CTGCACTGGG 
GCAGGAATCT AGTTTAGATT AACTGGCATT TTGGTTTTNT TCTAGCTCTA AAACCAGCTC 
CATCACTTGA AATGGCAAAA TAAATCATGA ATGAGGCCGG GGGCTGTGGC TCACACCTGT 
AATCCCAGCA CTCTGGGGGG 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
CCAGTACCCC AGAGCATCA 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
TGAACTTCCC CAAACCCTC 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TGGATGGAGA ACAAGGAATC 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
TGAACTTCTC CAAACCCTC 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
GGGCAGAAGC AACCTGA 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
GGAGGGACAG AAAGAGCC 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GGTCAGAATC GCTACCTATT G 



WO 98/23779 PCT/US97/21358 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 19 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
AGCTCGCTGA GACTTCCTG 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
GAAGTTGTCA TTTTATAAAC CTTT 24 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
AGTGATCCTC ATGAGGCTTT 20 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "PCR primer" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
TTAACTGTCT GTACAGGCTT GAT 23 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5711 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60 

CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120 

TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180 

TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240 

ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300 

GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360 

AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 420 

ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 480 

AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 540 

AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 600 

CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 660 

AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 
ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 
CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 
CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 
ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 
GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 
AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 
GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 
ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 
CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 
AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 
GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 
AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 
TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 
TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 
TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 
AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 
CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 
AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 
CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 
AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 
ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 
ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 
TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 
GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 
GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG CGATACTTTC CCAGAGCTGA 
AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 
CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 2400 

AAAGATCTGT AGAGAGTAGC AGTATTTCAT TGGTACCTGG TACTGATTAT GGCACTCAGG 2460 

AAAGTATCTC GTTACTGGAA GTTAGCACTC TAGGGAAGGC AAAAACAGAA CCAAATAAAT 2520 

GTGTGAGTCA GTGTGCAGCA TTTGAAAACC CCAAGGGACT AATTCATGGT TGTTCCAAAG 2580 

ATAATAGAAA TGACACAGAA GGCTTTAAGT ATCCATTGGG ACATGAAGTT AACCACAGTC 2640 

GGGAAACAAG CATAGAAATG GAAGAAAGTG AACTTGATGC TCAGTATTTG CAGAATACAT 2700 

TCAAGGTTTC AAAGCGCCAG TCATTTGCTC CGTTTTCAAA TCCAGGAAAT GCAGAAGAGG 2760 

AATGTGCAAC ATTCTCTGCC CACTCTGGGT CCTTAAAGAA ACAAAGTCCA AAAGTCACTT 2820 

TTGAATGTGA ACAAAAGGAA GAAAATCAAG GAAAGAATGA GTCTAATATC AAGCCTGTAC 2880 

AGACAGTTAA TATCACTGCA GGCTTTCCTG TGGTTGGTCA GAAAGATAAG CCAGTTGATA 2940 

ATGCCAAATG TAGTATCAAA GGAGGCTCTA GGTTTTGTCT ATCATCTCAG TTCAGAGGCA 3000 

ACGAAACTGG ACTCATTACT CCAAATAAAC ATGGACTTTT ACAAAACCCA TATCGTATAC 3060 

CACCACTTTT TCCCATCAAG TCATTTGTTA AAACTAAATG TAAGAAAAAT CTGCTAGAGG 3120 

AAAACTTTGA GGAACATTCA ATGTCACCTG AAAGAGAAAT GGGAAATGAG AACATTCCAA 3180 

GTACAGTGAG CACAATTAGC CGTAATAACA TTAGAGAAAA TGTTTTTAAA GAAGCCAGCT 3240 

CAAGCAATAT TAATGAAGTA GGTTCCAGTA CTAATGAAGT GGGCTCCAGT ATTAATGAAA 3300 

TAGGTTCCAG TGATGAAAAC ATTCAAGCAG AACTAGGTAG AAACAGAGGG CCAAAATTGA 3360 

ATGCTATGCT TAGATTAGGG GTTTTGCAAC CTGAGGTCTA TAAACAAAGT CTTCCTGGAA 3420 

GTAATTGTAA GCATCCTGAA ATAAAAAAGC AAGAATATGA AGAAGTAGTT CAGACTGTTA 3480 

ATACAGATTT CTCTCCATAT CTGATTTCAG ATAACTTAGA ACAGCCTATG GGAAGTAGTC 3540 

ATGCATCTCA GGTTTGTTCT GAGACACCTG ATGACCTGTT AGATGATGGT GAAATAAAGG 3600 

AAGATACTAG TTTTGCTGAA AATGACATTA AGGAAAGTTC TGCTGTTTTT AGCAAAAGCG 3660 

TCCAGAAAGG AGAGCTTAGC AGGAGTCCTA GCCCTTTCAC CCATACACAT TTGGCTCAGG 3720 

GTTACCGAAG AGGGGCCAAG AAATTAGAGT CCTCAGAAGA GAACTTATCT AGTGAGGATG 3780 

AAGAGCTTCC CTGCTTCCAA CACTTGTTAT TTGGTAAAGT AAACAATATA CCTTCTCAGT 3840 

CTACTAGGCA TAGCACCGTT GCTACCGAGT GTCTGTCTAA GAACACAGAG GAGAATTTAT 3900 

TATCATTGAA GAATAGCTTA AATGACTGCA GTAACCAGGT AATATTGGCA AAGGCATCTC 3960 

AGGAACATCA CCTTAGTGAG GAAACAAAAT GTTCTGCTAG CTTGTTTTCT TCACAGTGCA 4020 

GTGAATTGGA AGACTTGACT GCAAATACAA ACACCCAGGA TCCTTTCTTG ATTGGTTCTT 4080 

CCAAACAAAT GAGGCATCAG TCTGAAAGCC AGGGAGTTGG TCTGAGTGAC AAGGAATTGG 4140 
TTTCAGATGA TGAAGAAAGA GGAACGGGCT TGGAAGAAAA TAATCAAGAA GAGCAAAGCA 4200 

TGGATTCAAA CTTAGGTGAA GCAGCATCTG GGTGTGAGAG TGAAACAAGC GTCTCTGAAG 4260 

ACTGCTCAGG GCTATCCTCT CAGAGTGACA TTTTAACCAC TCAGCAGAGG GATACCATGC 4320 

AACATAACCT GATAAAGCTC CAGCAGGAAA TGGCTGAACT AGAAGCTGTG TTAGAACAGC 4380 

ATGGGAGCCA GCCTTCTAAC AGCTACCCTT CCATCATAAG TGACTCTTCT GCCCTTGAGG 4440 

ACCTGCGAAA TCCAGAACAA AGCACATCAG AAAAAGCAGT ATTAACTTCA CAGAAAAGTA 4500 

GTGAATACCC TATAAGCCAG AATCCAGAAG GCCTTTCTGC TGACAAGTTT GAGGTGTCTG 4560 

CAGATAGTTC TACCAGTAAA AATAAAGAAC CAGGAGTGGA AAGGTCATCC CCTTCTAAAT 4620 

GCCCATCATT AGATGATAGG TGGTACATGC ACAGTTGCTC TGGGAGTCTT CAGAATAGAA 4680 

ACTACCCATC TCAAGAGGAG CTCATTAAGG TTGTTGATGT GGAGGAGCAA CAGCTGGAAG 4740 

AGTCTGGGCC ACACGATTTG ACGGAAACAT CTTACTTGCC AAGGCAAGAT CTAGAGGGAA 4800 

CCCCTTACCT GGAATCTGGA ATCAGCCTCT TCTCTGATGA CCCTGAATCT GATCCTTCTG 4860 

AAGACAGAGC CCCAGAGTCA GCTCGTGTTG GCAACATACC ATCTTCAACC TCTGCATTGA 4920 

AAGTTCCCCA ATTGAAAGTT GCAGAATCTG CCCAGAGTCC AGCTGCTGCT CATACTACTG 4980 

ATACTGCTGG GTATAATGCA ATGGAAGAAA GTGTGAGCAG GGAGAAGCCA GAATTGACAG 5040 

CTTCAACAGA AAGGGTCAAC AAAAGAATGT CCATGGTGGT GTCTGGCCTG ACCCCAGAAG 5100 

AATTTATGCT CGTGTACAAG TTTGCCAGAA AACACCACAT CACTTTAACT AATCTAATTA 5160 

CTGAAGAGAC TACTCATGTT GTTATGAAAA CAGATGCTGA GTTTGTGTGT GAACGGACAC 5220 

TGAAATATTT TCTAGGAATT GCGGGAGGAA AATGGGTAGT TAGCTATTTC TGGGTGACCC 5280 

AGTCTATTAA AGAAAGAAAA ATGCTGAATG AGCATGATTT TGAAGTCAGA GGAGATGTGG 5340 

TCAATGGAAG AAACCACCAA GGTCCAAAGC GAGCAAGAGA ATCCCAGGAC AGAAAGATCT 5400 

TCAGGGGGCT AGAAATCTGT TGCTATGGGC CCTTCACCAA CATGCCCACA GATCAACTGG 5460 

AATGGATGGT ACAGCTGTGT GGTGCTTCTG TGGTGAAGGA GCTTTCATCA TTCACCCTTG 5520 

GCACAGGTGT CCACCCAATT GTGGTTGTGC AGCCAGATGC CTGGACAGAG GACAATGGCT 5580 

TCCATGCAAT TGGGCAGATG TGTGAGGCAC CTGTGGTGAC CCGAGAGTGG GTGTTGGACA 5640 

GTGTAGCACT CTACCAGTGC CAGGAGCTGG ACACCTACCT GATACCCCAG ATCCCCCACA 5700 

GCCACTACTG A 5711 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Consensus sequence" 

(iii) HYPOTHETICAL: YES 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GGTCANNNTG GTCNNNNNNN NNTGACC 27 
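
By way of illustration only, the consensus of SEQ ID NO: 15 can be searched for in a longer sequence by treating each N as a wildcard for any of A, C, G or T. The sketch below is an assumption about how such a search might be carried out and is not a method described in this listing; the function names `consensus_to_regex` and `find_matches` are hypothetical, and only the strand supplied is searched.

```python
import re
from typing import List

# SEQ ID NO: 15 as listed above, with the spaces between blocks removed.
CONSENSUS = "GGTCANNNTGGTCNNNNNNNNNTGACC"


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus string into a regex source, mapping N to [ACGT]."""
    return "".join("[ACGT]" if base == "N" else base for base in consensus.upper())


def find_matches(sequence: str, consensus: str = CONSENSUS) -> List[int]:
    """Return 0-based start positions of every match, including overlapping ones."""
    # Wrapping the pattern in a lookahead lets overlapping matches be reported.
    lookahead = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]
```

To search the opposite strand as well, the same function could be applied to the reverse complement of the input sequence.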



WHAT IS CLAIMED IS: 

1. A DNA molecule comprising a sequence shown as SEQ ID NO:2. 

2. A DNA molecule comprising a sequence complementary to SEQ ID NO:2. 

3. An RNA molecule complementary to the DNA of claim 1. 

4. An RNA molecule complementary to the DNA of claim 2. 

5. A nucleic acid consisting essentially of the sequence shown by SEQ ID NO:3, SEQ 
ID NO:4, SEQ ID NO:5 or SEQ ID NO:6. 

6. A nucleic acid consisting essentially of the sequence shown by SEQ ID NO:7 or SEQ 
ID NO:8. 

7. A nucleic acid consisting essentially of the sequence shown by SEQ ID NO:10. 

8. A method for specifically amplifying a portion of the BRCA1 gene or cDNA while not 
amplifying the LBRCA1 gene or cDNA, said method comprising performing a 
polymerase chain reaction using primers 120.2 and 120.3 and using cycling conditions 
consisting essentially of 45 seconds at 94°C, 60 seconds at 57°C, and 90 seconds at 
72°C. 

9. A method for specifically amplifying a portion of the BRCA1 gene or cDNA while not 
amplifying the LBRCA1 gene or cDNA, said method comprising performing a 
polymerase chain reaction using primers 214.3 and 42.2 and using a first set of 
cycling conditions followed by a second set of cycling conditions wherein said first 
set of cycling conditions consists essentially of cycles of 30 seconds at 94°C, 60 
seconds at 60°C, and 165 seconds at 72°C and said second set of cycling conditions 
consists essentially of cycles of 30 seconds at 94°C, 60 seconds at 55°C and 170 
seconds at 72°C. 

10. A method for specifically amplifying a portion of the BRCA1 gene or cDNA while not 
amplifying the LBRCA1 gene or cDNA, said method comprising performing a 
polymerase chain reaction using primers 42.2 and 120.2 and using cycling conditions 
consisting essentially of 30 seconds at 94°C, 60 seconds at 58°C and 90 seconds at 
72°C. 

11. A method for specifically amplifying a portion of the 1A1.3B gene or cDNA while not 
amplifying the L1A1.3B gene or cDNA, said method comprising performing a 
polymerase chain reaction using primers 225.1 and 225.4 and using cycling conditions 
consisting essentially of 45 seconds at 94°C, 60 seconds at 61°C and 90 seconds at 
72°C. 
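
By way of illustration only, the cycling conditions recited in claims 8 to 11 can be collected into a small table so that the time per cycle can be compared across primer pairs. The sketch below is not part of the claimed methods. The class and function names are hypothetical, and because the claims do not state how many cycles are run, the cycle counts passed to `total_seconds` are assumptions supplied by the user.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    seconds: int
    celsius: int


Cycle = Tuple[Step, Step, Step]  # denaturation, annealing, extension


@dataclass(frozen=True)
class Program:
    primers: Tuple[str, str]
    cycling_sets: Tuple[Cycle, ...]


def cycle(denature_s: int, anneal_s: int, anneal_c: int, extend_s: int) -> Cycle:
    """Build one cycle; every claim denatures at 94 C and extends at 72 C."""
    return (Step(denature_s, 94), Step(anneal_s, anneal_c), Step(extend_s, 72))


# Keyed by claim number; the values are transcribed from claims 8 to 11.
PROGRAMS: Dict[int, Program] = {
    8: Program(("120.2", "120.3"), (cycle(45, 60, 57, 90),)),
    9: Program(("214.3", "42.2"), (cycle(30, 60, 60, 165), cycle(30, 60, 55, 170))),
    10: Program(("42.2", "120.2"), (cycle(30, 60, 58, 90),)),
    11: Program(("225.1", "225.4"), (cycle(45, 60, 61, 90),)),
}


def seconds_per_cycle(one_cycle: Cycle) -> int:
    """Sum of the hold times of the three steps (ramp times are ignored)."""
    return sum(step.seconds for step in one_cycle)


def total_seconds(program: Program, cycles_per_set: List[int]) -> int:
    """Total hold time given a user-supplied (assumed) cycle count per set."""
    if len(cycles_per_set) != len(program.cycling_sets):
        raise ValueError("one cycle count is required per cycling set")
    return sum(
        count * seconds_per_cycle(one_cycle)
        for count, one_cycle in zip(cycles_per_set, program.cycling_sets)
    )
```

For example, `seconds_per_cycle(PROGRAMS[8].cycling_sets[0])` gives the hold time of one cycle of the claim 8 conditions.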

12. A method for analyzing somatic tissue for deletion of at least a portion of BRCA1 
from one chromosome in a person who is heterozygous for a polymorphism, said 
method consisting of the following steps: 

(a) determining whether the person is heterozygous in germline tissue for a specific 
polymorphism within BRCA1 or its promoter region; 

(b) determining whether the person is heterozygous in said somatic tissue for said 
specific polymorphism; and 

(c) comparing the zygosity of the polymorphism in said germline tissue and said 
somatic tissue wherein: 

1) if the germline tissue is heterozygous and the somatic tissue is heterozygous 
for the polymorphism then there has been no deletion of the polymorphic gene region; 

2) if the germline tissue is heterozygous and the somatic tissue is not 
heterozygous then there has been a deletion of the polymorphic gene region; and 

3) if the germline tissue is homozygous the assay is uninformative unless said 
somatic tissue is null for the polymorphism thereby indicating a loss of all copies of 
the gene region within the somatic tissue. 
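
By way of illustration only, the comparison in step (c) of claim 12 can be written as a small decision function. The sketch below is an assumption about how the three outcomes might be encoded and is not part of the claimed method. The function name `interpret_zygosity` and the returned strings are hypothetical, and alleles are represented simply as the labels detected in each tissue (for example "C" and "T").

```python
from typing import Iterable


def interpret_zygosity(germline_alleles: Iterable[str],
                       somatic_alleles: Iterable[str]) -> str:
    """Apply the rules of step (c) to the alleles detected in each tissue.

    Each argument lists the distinct alleles observed at one polymorphic
    site; an empty somatic list means the site is null in that tissue.
    """
    germline = set(germline_alleles)
    somatic = set(somatic_alleles)
    if not germline:
        raise ValueError("at least one germline allele is required")

    if len(germline) >= 2:  # germline heterozygous
        if len(somatic) >= 2:
            return "no deletion"  # rule 1: both tissues heterozygous
        return "deletion"  # rule 2: somatic tissue not heterozygous

    # Germline homozygous: rule 3.
    if not somatic:
        return "loss of all copies"
    return "uninformative"
```

For a C/T site, `interpret_zygosity(["C", "T"], ["C"])` falls under rule 2, while `interpret_zygosity(["C"], ["C"])` falls under rule 3 and is uninformative.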

13. The method of claim 12 wherein said polymorphism is the C/T polymorphism at base 
612 of SEQ ID NO:1. 

14. The method of claim 12 wherein said polymorphism is the AAC/AACAAC 
polymorphism at bases 980-982 of SEQ ID NO:1. 

15. The method of claim 12 wherein said polymorphism is the A/G polymorphism at base 
1723 of SEQ ID NO:2. 

16. The method of claim 12 wherein said polymorphism is the A/G polymorphism at base 
2182 of SEQ ID NO:2. 

17. A method for determining the copy number of BRCA1 genes within a human genome 
by using a quantitative polymerase chain reaction. 

18. The method of claim 17 wherein PCR primers corresponding to a fragment of SEQ ID 
NO:2 or its complement are used. 

19. A method for determining the copy number and large-scale genomic structure of a 
human genomic region containing a BRCA1 promoter using pulsed-field gel 
electrophoresis. 

20. A method for specifically amplifying a target nucleic acid that comprises at least 25 
consecutive nucleotides of SEQ ID NO:1 or its complement while not amplifying a 
second nucleic acid that comprises at least 25 consecutive nucleotides of SEQ ID 
NO:2 or its complement, wherein said method comprises performing a polymerase 
chain reaction using primers with 3' termini wherein when said primers hybridize to 
said target nucleic acid said 3' termini will be complementary to a strand of said target 
nucleic acid to which said primer hybridizes and wherein if said primers bind to said 
second nucleic acid said 3' termini will not be complementary to a strand of said 
second nucleic acid to which said primer binds. 

21. The method of claim 20 wherein said 3' termini are defined as a single nucleotide 
which is at the ultimate 3' position of each primer. 

22. The method of claim 20 wherein said 3' termini are defined as two nucleotides which 
are the final two nucleotides at the 3' end of each primer. 
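
By way of illustration only, the 3' terminus condition of claims 20 to 22 can be checked in silico before a primer is synthesized. The sketch below is an assumption and is not part of the claimed method. The function names are hypothetical, and each binding site is assumed to be supplied already aligned to the primer, written 5' to 3' in the primer's own orientation, so that a perfectly complementary site reads identically to the primer. The parameter `n` is 1 for the definition in claim 21 and 2 for the definition in claim 22.

```python
def three_prime_matches(primer: str, site: str, n: int = 1) -> bool:
    """True if the last n bases of the primer agree with the aligned site."""
    if n < 1 or n > len(primer) or len(site) != len(primer):
        raise ValueError("n must be 1..len(primer) and site must match primer length")
    return primer[-n:].upper() == site[-n:].upper()


def is_selective(primer: str, target_site: str, second_site: str, n: int = 1) -> bool:
    """True if the 3' terminus pairs with the target but not with the second site."""
    return (three_prime_matches(primer, target_site, n)
            and not three_prime_matches(primer, second_site, n))
```

With toy sequences (not taken from the sequence listing), `is_selective("ACGTTGCA", "ACGTTGCA", "ACGTTGCG")` is satisfied because only the final base differs between the two sites. A difference at the penultimate base alone is detected only when `n=2`.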

23. A nucleic acid comprising at least 10 consecutive nucleotides of SEQ ID NO:2 or its 
complement. 

24. A method of performing a polymerase chain reaction wherein said method uses 
primers that have a nucleotide sequence identical to a portion of SEQ ID NO:2 or its 
complement. 

25. Nucleic acid oligonucleotides useful as primers for a polymerase chain reaction 
wherein said oligonucleotides consist of a nucleic acid sequence that is identical to a 
portion of SEQ ID NO:2 or its complement. 